Abstract

In this paper we consider the problem of computing an mRNA sequence of maximal
similarity for a given mRNA under secondary structure constraints, introduced by Backofen et
al. in [BNS02a] and denoted as the MRSO problem. The problem is known to be NP-complete
for planar associated implied structure graphs of vertex degree at most 3. In [BFHV05]
first polynomial dynamic programming algorithms for MRSO on implied structure graphs
with maximum vertex degree 3 and bounded cut-width are shown. We give a simple but more
general polynomial dynamic programming solution for the MRSO problem for associated
implied structure graphs of bounded clique-width. Our result implies that MRSO is
polynomial for graphs of bounded tree-width, co-graphs, $P_4$-sparse graphs, and distance
hereditary graphs. Further we conclude that the problem of comparing two solutions for
MRSO is hard for the class $P^{NP}_{\|}$, which is defined as the set of problems which can be
solved in polynomial time with a number of parallel queries to an oracle in NP.

Keywords: graph algorithms, protein similarity search, mRNA structure, computational 
1 Introduction

One of the main processes in biology is the transformation of DNA into proteins. This
process is divided into two steps. The first step is the transcription, which copies the DNA
into a certain RNA molecule called messenger RNA (mRNA). Each mRNA is a string of
four types of nucleotides, i.e. elements of {A, C, G, U}. (A,U) and (C,G) are known as the
complementary nucleotide pairs. Every string of three nucleotides is called a codon. The
second step is the translation, which converts the mRNA codon by codon into a sequence of
amino acids. Every protein is the result of a translation of some mRNA.

We can represent every mRNA as a graph by considering its nucleotides as vertices and 
possible edges (so called bonds) between vertices representing complementary nucleotides. 
The resulting graph is also denoted as (secondary) structure graph of the mRNA. If we 
consider the codons as vertices we obtain the associated implied structure graph. 

In this paper we consider the MRna Structure Optimization (MRSO) problem, introduced
by Backofen et al. [BNS02a, BNS02b]. The problem is to compute an mRNA sequence of
maximal similarity for a given mRNA that additionally satisfies some secondary structure
constraints. If the input structure graph of the problem has vertex degree at most one, we
will denote the restriction of the problem by MRSO-dl. The MRSO-dl problem (and thus
also the MRSO problem) has been shown to be NP-complete for planar implied structure graphs
[BFHV05]. In [BNS02a] a linear time algorithm for the MRSO-dl problem has been shown
for outer-planar implied structure graphs.

A very useful tool to solve hard problems on restricted inputs is parameterized complexity
[DF99]. The main idea is that only input graphs of a bounded graph parameter k are
considered. The running time of the algorithms is exponential in k, but polynomial for fixed k.
Fixed parameter algorithms are frequently used in computational biology, see e.g. works of
Bodlaender et al. [BDFW95, BDF+95]. In [BFHV05] first fixed parameter algorithms for the
MRSO-dl problem for implied structure graphs of bounded cut-width are given. There are
several known secondary (tertiary and quaternary) structures which are not simple, recursive,
or do not correspond to outer-planar implied structure graphs [Aku00]. Further it seems
likely that amino acids of more complicated secondary structures will be discovered in the
future [BFHV05]. Therefore, we give a more general fixed parameter solution for MRSO on
graphs of bounded clique-width, which form a very large class of implied structure graphs.

The clique-width of a graph is defined by a composition mechanism for vertex-labeled
graphs [CO00]. The operations are the vertex disjoint union, the addition of edges between
vertices controlled by a label pair, and the relabeling of vertices. The clique-width of a graph
G is the minimum number of labels needed to define it. Each such composition leads to a
tree structure. Using this tree structure, many NP-complete graph problems can be solved
by dynamic programming in polynomial time for graphs of bounded clique-width, see e.g.
[CMR00, EGW01, GW06, KR03].

This paper is organized as follows. In Section 2, we recall the definition of clique-width and
a general method to solve graph problems on graphs of bounded clique-width. In Section
3, we recall the definition of MRSO from Backofen et al. [BNS02a, BNS02b]. In Section
4, we show a simple but very general polynomial time solution of the problem for implied
structure graphs of bounded clique-width. Our result implies that MRSO is polynomial for
graphs of bounded tree-width, co-graphs, $P_4$-sparse graphs, and distance hereditary graphs
and reproves the existence of polynomial time algorithms for MRSO-dl of [BFHV05] for
graphs of bounded tree-width and graphs of bounded cut-width. In Section 5, we briefly
conclude that the problem of comparing two solutions for MRSO-dl is hard for the class $P^{NP}_{\|}$,
which is defined as the set of problems which can be solved in polynomial time with a number
of parallel queries to an oracle in NP.

2 Clique-width and polynomial time algorithms 

Let $[k] := \{1, \ldots, k\}$ be the set of all integers between 1 and k. We work with finite undirected
labeled graphs $G = (V_G, E_G, lab_G)$, where $V_G$ is a finite set of vertices labeled by some mapping
$lab_G : V_G \to [k]$ and $E_G \subseteq \{\{u, v\} \mid u, v \in V_G, u \neq v\}$ is a finite set of edges. The labeled
graph consisting of a single vertex labeled by $a \in [k]$ is denoted by $\bullet_a$. For the definition of
special graph classes we refer to the survey of Brandstädt et al. [BLS99].

The notion of clique-width for labeled graphs is defined by Courcelle and Olariu in [CO00]
as follows. (This complexity measure was first considered by Courcelle, Engelfriet, and
Rozenberg [CER91, CER93].)
Definition 2.1 (Clique-width, [CO00]) Let k be some positive integer. The class $CW_k$ of
labeled graphs is recursively defined as follows.

1. The single vertex graph $\bullet_a$ for some $a \in [k]$ is in $CW_k$.

2. Let $G, J \in CW_k$ be two vertex disjoint labeled graphs, then $G \oplus J := (V', E', lab')$ defined
by $V' := V_G \cup V_J$, $E' := E_G \cup E_J$, and

\[ lab'(u) := \begin{cases} lab_G(u) & \text{if } u \in V_G \\ lab_J(u) & \text{if } u \in V_J \end{cases} \qquad \forall u \in V' \]

is in $CW_k$.

3. Let $a, b \in [k]$ be two distinct integers and $G \in CW_k$ be a labeled graph, then

(a) $\rho_{a \to b}(G) := (V_G, E_G, lab')$ defined by

\[ lab'(u) := \begin{cases} lab_G(u) & \text{if } lab_G(u) \neq a \\ b & \text{if } lab_G(u) = a \end{cases} \qquad \forall u \in V_G \]

is in $CW_k$ and

(b) $\eta_{a,b}(G) := (V_G, E', lab_G)$ defined by $E' := E_G \cup \{\{u, v\} \mid u, v \in V_G, u \neq v, lab_G(u) = a, lab_G(v) = b\}$ is in $CW_k$.

The clique-width of a labeled graph G is the least integer k such that $G \in CW_k$. The
clique-width of an unlabeled graph $G = (V_G, E_G)$ is the smallest integer k such that there is some
mapping $lab_G : V_G \to [k]$ such that the labeled graph $(V_G, E_G, lab_G)$ has clique-width at most
k.

A class of graphs C has bounded clique-width if there is some integer k such that any
graph in C has clique-width at most k, i.e. there is some k such that $C \subseteq CW_k$. The minimal
such k, if it exists, is defined as the clique-width of class C.

An expression built with the operations $\bullet_a$, $\oplus$, $\rho_{a \to b}$, $\eta_{a,b}$ for integers $a, b \in [k]$ is called a
clique-width k-expression. The graph defined by expression X is denoted by val(X). The
following two clique-width expressions $X_1$ and $X_2$ define the labeled graphs $G_1$ and $G_2$ in
Fig. 1.

\[ X_1 = \eta_{1,2}((\rho_{2 \to 1}(\eta_{1,2}(\bullet_1 \oplus \bullet_2))) \oplus \bullet_2) \]

\[ X_2 = \rho_{1 \to 2}(\eta_{2,3}(((\eta_{1,2}(\bullet_1 \oplus \bullet_2)) \oplus (\eta_{1,2}(\bullet_1 \oplus \bullet_2))) \oplus \bullet_3)) \]
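The four operations can be evaluated mechanically. The following is a minimal Python sketch, not taken from the paper: the representation of a labeled graph as a pair (label dictionary, edge set) and the function names `single`, `union`, `relabel`, and `add_edges` are assumptions made for illustration. It evaluates the expression $X_1$ from above.

```python
from itertools import count

# A labeled graph is a pair (labels, edges): labels maps vertex id -> label,
# edges is a set of frozensets {u, v}. This representation is an assumption.

def single(label, ids):
    """Single vertex graph with the given label; ids yields fresh vertex ids."""
    v = next(ids)
    return ({v: label}, set())

def union(g, j):
    """Disjoint union; vertex ids are globally unique, so merging is safe."""
    return ({**g[0], **j[0]}, g[1] | j[1])

def relabel(a, b, g):
    """rho_{a->b}: every vertex labeled a receives label b."""
    labels = {v: (b if lab == a else lab) for v, lab in g[0].items()}
    return (labels, set(g[1]))

def add_edges(a, b, g):
    """eta_{a,b}: join every a-labeled vertex with every b-labeled vertex."""
    labels, edges = g
    new = {frozenset((u, v)) for u in labels for v in labels
           if u != v and labels[u] == a and labels[v] == b}
    return (labels, edges | new)

# X_1 = eta_{1,2}((rho_{2->1}(eta_{1,2}(single_1 + single_2))) + single_2)
ids = count()
inner = add_edges(1, 2, union(single(1, ids), single(2, ids)))
g1 = add_edges(1, 2, union(relabel(2, 1, inner), single(2, ids)))
print(sorted(g1[0].values()), len(g1[1]))
```

Under these assumptions val($X_1$) has three vertices with labels 1, 1, 2 and three edges, i.e. it is a triangle.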

If a graph G has clique-width at most k then the edge complement $\overline{G}$ has clique-width at
most 2k [CO00]. Distance hereditary graphs have clique-width at most 3 [GR00]. Co-graphs,
i.e. $P_4$-free graphs, have clique-width at most 2 [CO00]. Further, many graph classes defined
by a limited number of $P_4$ have bounded clique-width, e.g. $P_4$-sparse graphs, $P_4$-tidy graphs, and
(q, t)-graphs [CMR00, MR99]. The clique-width of permutation graphs, interval graphs, grids
and planar graphs is not bounded [GR00]. An arbitrary graph with n vertices has
clique-width at most $n - r$, if $2^r < n - r$. Every graph of tree-width at most k has clique-width
at most $3 \cdot 2^{k-1}$ [CR05]. The recognition problem for graphs of clique-width at most k is
still open for $k \ge 4$. Clique-width of at most 3 is decidable in polynomial time [CHL+00].
Clique-width of at most 2 is decidable in linear time [CPS85]. Both algorithms also give a
clique-width expression if the input graph has clique-width at most 3 or clique-width at most
2, respectively. The clique-width of tree-width bounded graphs is also computable in linear
time [EGW03]. Minimizing clique-width is NP-complete [FRRS06].

Figure 1: Two labeled graphs $G_1$ and $G_2$ defined by expressions $X_1$ and $X_2$, respectively.
The inscriptions in the circles represent the labels of the vertices.

Courcelle et al. have shown in [CMR00] that all graph properties which are expressible in
monadic second order logic with quantifications over vertices and vertex sets ($MSO_1$-logic) are
decidable in linear time on clique-width bounded graphs. Furthermore, there are many
NP-complete graph problems which are not expressible in extended $MSO_1$-logic, like Hamiltonicity,
chromatic number, partition problems, and bounded degree subgraph problems, but which can
also be solved in polynomial time on clique-width bounded graphs. The algorithms can be
found in [EGW01, GW06, KR03]. The proofs are based on the following general dynamic
programming scheme.

Theorem 2.2 ([EGW01]) Let $\Pi$ be a graph problem and k be a positive integer. If there
is a mapping F that maps each clique-width k-expression X onto some structure F(X), such
that for all clique-width k-expressions X, Y and all $a, b \in [k]$

1. the size of F(X) is polynomially bounded in the size of X,

2. the answer to $\Pi$ for val(X) is computable in polynomial time from F(X),

3. $F(\bullet_a)$ is computable in time O(1),

4. $F(X \oplus Y)$ is computable in polynomial time from F(X) and F(Y), and

5. $F(\eta_{a,b}(X))$ and $F(\rho_{a \to b}(X))$ are computable in polynomial time from F(X).

Then for every clique-width k-expression X, the answer to $\Pi$ for graph val(X) is computable
in polynomial time from expression X.

One of the most important questions is how to find clique-width expressions. For graphs
of clique-width at most 3 an expression can be found in polynomial time, as stated above.
For graphs of larger clique-width, approximations of rank-width [OS06, Oum05, Oum06] lead to
approximations of clique-width and a corresponding expression. The best known result is the
following.

Theorem 2.3 ([Oum06]) For every fixed integer k there is an $O(|V_G|^3)$ algorithm that either
outputs a clique-width $(8^k - 1)$-expression of an input graph G, or confirms that the
clique-width of G is larger than k.






3 MRna Structure Optimization (MRSO) 



In this section we recall the MRna Structure Optimization (MRSO) problem as introduced
by Backofen et al. in [BNS02a].

A codon is a sequence of three nucleotides, i.e. a string of $\{A, C, G, U\}^3$. UAA, UAG and
UGA are called stop codons, the remaining codons represent 20 amino acids. An mRNA is a
sequence of n consecutive codons $S = s_1 \ldots s_{3n}$ over $\{A, C, G, U\}$, i.e. each codon of S is of
the form $s_{3i-2} s_{3i-1} s_{3i}$ for some $1 \le i \le n$.
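As a small illustration of this indexing, the following Python sketch splits a nucleotide string into its codons and checks for stop codons. The helper names `codons` and `has_stop_codon` are hypothetical and not part of the paper.

```python
STOP_CODONS = {"UAA", "UAG", "UGA"}

def codons(mrna):
    """Split s_1 ... s_3n into the n codons s_{3i-2} s_{3i-1} s_{3i}."""
    if len(mrna) % 3 != 0:
        raise ValueError("length must be a multiple of 3")
    if set(mrna) - set("ACGU"):
        raise ValueError("only the nucleotides A, C, G, U are allowed")
    return [mrna[i:i + 3] for i in range(0, len(mrna), 3)]

def has_stop_codon(mrna):
    """True if at least one codon of the sequence is a stop codon."""
    return any(c in STOP_CODONS for c in codons(mrna))

print(codons("AUGGCUUAA"))          # ['AUG', 'GCU', 'UAA']
print(has_stop_codon("AUGGCUUAA"))  # True
```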

The MRna Structure Optimization (MRSO) problem is defined in [BNS02a] as follows. Let
$S = S_1 \ldots S_{3n}$ be the nucleotide sequence of an mRNA and let $A = A_1 \ldots A_n$ be a given amino
acid sequence. The problem is to find an approximate mRNA sequence $N = N_1 \ldots N_{3n}$ with
amino acid sequence $A' = A'_1 \ldots A'_n$, such that N and S have the same secondary structure
and A and A' are of maximum similarity. The similarity between amino acid sequences is
measured by PAM matrices introduced by Dayhoff et al. [DSO78]. We will use n functions
$f_i$, $1 \le i \le n$, measuring the similarity between $A_i$ and $A'_i$.

In order to define MRSO as a general graph problem we use the following notions. Let
$\Sigma$ be a finite alphabet (in biological applications $\Sigma = \{A, C, G, U\}$ corresponds to the set of
nucleotides) and $T \subseteq \Sigma \times \Sigma$ be a set of complementary pairs over $\Sigma$ (in biological applications
$T = \{(C, G), (A, U)\}$ corresponds to the set of complementary nucleotide pairs). We denote
the complement of some $x \in \Sigma$ by $\overline{x}$. For some mRNA with nucleotide sequence S, we
define the structure graph of S by taking the nucleotides as vertices and edges between any
two vertices representing complementary nucleotides.

That is, to solve the MRSO problem we have to compute an admissible labeling over
$\Sigma$ (i.e. a labeling that satisfies the complementary conditions) for the vertices of the given
structure graph of highest possible value with respect to functions $f_i$, $i = 1, \ldots, n$.

Problem 3.1 (MRSO)

INSTANCE: A structure graph $G = (\{v_1, \ldots, v_{3n}\}, E_G)$ and n functions $f_1, \ldots, f_n$, where $f_i : \Sigma^3 \to \mathbb{Q}$
is associated with $\{v_{3i-2}, v_{3i-1}, v_{3i}\}$, $1 \le i \le n$.

OUTPUT: A function $L : V_G \to \Sigma$, such that $\{v_k, v_l\} \in E_G$ implies that $(L(v_k), L(v_l)) \in T$
and the cost

\[ MRSO(G, f_1, \ldots, f_n) := \sum_{i=1}^{n} f_i(L(v_{3i-2}), L(v_{3i-1}), L(v_{3i})) \]

is maximized.
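To make the problem statement concrete, the following Python sketch solves tiny instances by exhaustive search over all $|\Sigma|^{3n}$ labelings. It is only an illustration of the definition, not an algorithm from the paper. Two assumptions are made: the set T is treated as symmetric, and the similarity functions in the example simply count matching positions against hypothetical target codons instead of using PAM matrices.

```python
from itertools import product

SIGMA = "ACGU"
# Assumption: complementary pairs are treated as symmetric.
T = {("A", "U"), ("U", "A"), ("C", "G"), ("G", "C")}

def mrso_brute_force(n, edges, f):
    """Exhaustive search for Problem 3.1 on tiny inputs.

    n     -- number of codons (the graph has 3n vertices, indexed from 0)
    edges -- iterable of pairs (u, v) of vertex indices
    f     -- list of n functions, f[i] maps a 3-letter codon to a number
    Returns (best cost, one optimal labeling), or (None, None) if no
    admissible labeling exists.
    """
    best_cost, best_labeling = None, None
    for lab in product(SIGMA, repeat=3 * n):
        # Admissibility: bonded vertices must carry complementary letters.
        if any((lab[u], lab[v]) not in T for u, v in edges):
            continue
        cost = sum(f[i]("".join(lab[3 * i:3 * i + 3])) for i in range(n))
        if best_cost is None or cost > best_cost:
            best_cost, best_labeling = cost, "".join(lab)
    return best_cost, best_labeling

# Hypothetical similarity: number of positions equal to a target codon.
targets = ["AUG", "GCA"]
f = [lambda c, t=t: sum(x == y for x, y in zip(c, t)) for t in targets]

free_cost, _ = mrso_brute_force(2, [], f)
bonded_cost, _ = mrso_brute_force(2, [(0, 5)], f)
print(free_cost, bonded_cost)  # 6 5
```

In the example the bond between the first and the last nucleotide forbids the labeling A...A, so one of the six positions has to deviate from its target.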

In several applications motivated by biology, the structure graph of problem MRSO has vertex
degree at most one. Following the notions of [Bon04], we denote the corresponding problem
by MRSO-dl.

Since functions $f_i$, $i = 1, \ldots, n$, correspond to n amino acids, we next describe the MRSO
problem on the amino acid level instead of the given nucleotide level definition. For some
structure graph $G = (\{v_1, \ldots, v_{3n}\}, E_G)$ we define the implied structure graph $G_{impl} =
(V_{G_{impl}}, E_{G_{impl}})$ by $V_{G_{impl}} = \{u_1, \ldots, u_n\}$ and $E_{G_{impl}} = \{\{u_i, u_j\} \mid \exists r \in \{3i-2, 3i-1, 3i\},
\exists s \in \{3j-2, 3j-1, 3j\} : \{v_r, v_s\} \in E_G\}$. Fig. 2 shows an example for a structure graph and
the corresponding implied structure graph.
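The construction of the implied structure graph can be sketched in a few lines of Python. The function name `implied_structure_graph` and the edge-list input format are assumptions for illustration. Bonds inside a single codon are skipped here, which is also an assumption, made because the graphs of Section 2 have no loops.

```python
def implied_structure_graph(edges):
    """Map structure graph edges {v_r, v_s} (1-based nucleotide indices)
    to implied edges {u_i, u_j} (1-based codon indices)."""
    def codon_of(r):
        # v_{3i-2}, v_{3i-1}, v_{3i} all belong to codon i.
        return (r + 2) // 3

    implied = set()
    for r, s in edges:
        i, j = codon_of(r), codon_of(s)
        if i != j:  # assumption: bonds within one codon create no edge
            implied.add(frozenset((i, j)))
    return implied

# Three codons (nine nucleotides) with three bonds.
print(implied_structure_graph([(1, 9), (2, 8), (4, 7)]))
```

In this example the bonds (1, 9) and (2, 8) both connect codon 1 with codon 3 and therefore collapse into the single implied edge $\{u_1, u_3\}$, while the bond (4, 7) yields $\{u_2, u_3\}$.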

Next we generalize the complementary conditions given by T to amino acids, i.e. strings of
$\Sigma^3$. Let $(l_{3i-2} l_{3i-1} l_{3i}, l_{3j-2} l_{3j-1} l_{3j}) \in \Sigma^3 \times \Sigma^3$ be a pair and $v_{3i-2}, v_{3i-1}, v_{3i}, v_{3j-2}, v_{3j-1}, v_{3j}$ be
the six corresponding vertices of structure graph G. We say that the pair $(l_{3i-2} l_{3i-1} l_{3i}, l_{3j-2} l_{3j-1} l_{3j})$
satisfies T, if for every edge $\{v_{i'}, v_{j'}\} \in E_G$ with $3i-2 \le i' \le 3i$ and $3j-2 \le j' \le 3j$ it holds that $(l_{i'}, l_{j'}) \in T$.

Figure 2: A structure graph G and the corresponding implied structure graph $G_{impl}$. Vertices
corresponding to one codon are framed in a grey box.

Obviously, every solution for the MRSO problem on a structure graph G can be
transformed into a solution for the corresponding implied structure graph $G_{impl}$, and vice versa.
Further, for every structure graph G which is an instance of problem MRSO-dl, every vertex
in $G_{impl}$ has at most 3 adjacent edges, as shown in the example of Fig. 2.

The following results for the MRSO-dl problem have been shown. Problem MRSO-dl
is known to be NP-complete for implied structure graphs with page number at most 2, see
[BFHV05], and thus for planar implied structure graphs. Further, in [Bon04] it is shown that
MRSO-dl generalizes the maximum independent set problem for graphs of vertex degree at
most 3, which is also known to be NP-complete [GJ79]. Even the decision problem, where an
input graph G and n functions $f_1, \ldots, f_n$ are accepted if some assignment of the vertices reaches
a cost of c, is NP-complete for implied structure graphs of vertex degree at most 3 [BNS02a].

If the implied structure graph is outer-planar, MRSO-dl is solvable in linear time [BNS02a].
Further, in [BFHV05] polynomial fixed parameter algorithms for MRSO-dl are given for implied
structure graphs with a bounded number of edge crossings, implied structure graphs with a bounded
number of degree 3 vertices, and implied structure graphs of bounded cut-width (which also
implies a polynomial solution for MRSO-dl on tree-width bounded graphs).

Since there also exist mRNA structures with bonds between more than two nucleotides
[Aku00] and amino acids of more complicated secondary structures [BFHV05], we next give
a more general solution for problem MRSO for implied structure graphs of bounded
clique-width.



4 MRSO on implied structure graphs of bounded clique-width 

We next use the scheme of Theorem 2.2 to obtain a polynomial time solution for the
MRSO problem for associated implied structure graphs of bounded clique-width.

Theorem 4.1 For every positive integer k, problem MRSO can be solved in polynomial time 
for every structure graph that defines an implied structure graph which is given by some 
clique-width k-expression. 

Proof Let G be a structure graph for the implied structure graph $G_{impl} = (\{u_1, \ldots, u_n\}, E_{G_{impl}},
lab_{G_{impl}})$, which is defined by some clique-width k-expression X. For every admissible labeling
$lab_s : V_{val(X)} \to \Sigma^3$ of val(X) we define a pair $(L, f)$, where $L = \{(lab_{val(X)}(u), lab_s(u)) \mid u \in
V_{val(X)}\} \subseteq [k] \times \Sigma^3$ and $f = \sum_{u_i \in V_{val(X)}} f_i(lab_s(u_i))$. Let F(X) be the set of all mutually
different pairs $(L, f)$ for all admissible labelings of the vertices of graph val(X) with labels
over $\Sigma$. Then F(X) is polynomially bounded in the size of X, because F(X) has at most
(|V| — ' k ■ |V|' S ' mutually different pairs. Each pair contains a label set with at most 
$|\Sigma|^3 \cdot k$ different pairs of $[k] \times \Sigma^3$ and a sum of at most $|\Sigma|^3$ different addends.

The following observations show that for every fixed integer k, $F(\bullet_a)$, $a \in [k]$, is
computable in time O(1), $F(X \oplus Y)$ is computable in polynomial time from F(X) and F(Y), and
$F(\eta_{a,b}(X))$ and $F(\rho_{a \to b}(X))$, $a, b \in [k]$, are computable in polynomial time from F(X).

1. If val(X) consists of a single vertex $u_i$, then
$F(\bullet_a) = \{(\{(a, l)\}, f_i(l)) \mid l \in \Sigma^3\}$.

2. $F(X \oplus Y)$ is the set of all pairs $(L \cup L', f + f')$ which can be obtained from a pair
$(L, f) \in F(X)$ and a pair $(L', f') \in F(Y)$.

3. $F(\eta_{a,b}(X)) = \{(L, f) \in F(X) \mid (a, l_1), (b, l_2) \in L \Rightarrow (l_1, l_2) \text{ satisfies } T\}$

4. $F(\rho_{a \to b}(X)) = \{(\{(\rho_{a \to b}(a_1), l_1), \ldots, (\rho_{a \to b}(a_m), l_m)\}, f) \mid (\{(a_1, l_1), \ldots, (a_m, l_m)\}, f) \in F(X)\}$

There is an admissible labeling of the vertices of val(X) with cost f if and only if there is
some pair $(L, f) \in F(X)$. The corresponding labeling of the vertices of val(X) from $\Sigma^3$ can
be recomputed from expression X. By Theorem 2.2 the result follows. □
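The four rules of the proof translate directly into a recursive procedure. The following Python sketch is a simplified illustration and not the paper's implementation. The tuple encoding of expressions and the names `F` and `best_cost` are hypothetical. In particular, the test "satisfies T" is replaced by a caller-supplied predicate `compatible` on two codon labels that does not depend on which nucleotides are bonded, which is a simplifying assumption, and the example uses an abstract two-letter codon alphabet instead of $\Sigma^3$.

```python
from itertools import product

def F(expr, codons, f, compatible):
    """Return the set of pairs (L, cost) for a clique-width expression.

    expr is a nested tuple (hypothetical encoding):
      ("vertex", a, i)   single vertex u_i with label a
      ("union", X, Y)    disjoint union
      ("eta", a, b, X)   insert edges between labels a and b
      ("rho", a, b, X)   relabel a to b
    f[i][l] is the similarity value of codon label l at vertex u_i.
    """
    op = expr[0]
    if op == "vertex":
        _, a, i = expr
        return {(frozenset({(a, l)}), f[i][l]) for l in codons}
    if op == "union":
        FX = F(expr[1], codons, f, compatible)
        FY = F(expr[2], codons, f, compatible)
        return {(L | M, c + d) for (L, c), (M, d) in product(FX, FY)}
    if op == "eta":
        _, a, b, X = expr
        # Keep a pair only if all a-labeled and b-labeled codons fit together.
        return {(L, c) for (L, c) in F(X, codons, f, compatible)
                if all(compatible(l1, l2)
                       for (p, l1) in L if p == a
                       for (q, l2) in L if q == b)}
    if op == "rho":
        _, a, b, X = expr
        return {(frozenset((b if p == a else p, l) for (p, l) in L), c)
                for (L, c) in F(X, codons, f, compatible)}
    raise ValueError("unknown operation: %r" % (op,))

def best_cost(expr, codons, f, compatible):
    """Maximum cost over all admissible labelings, or None if there is none."""
    return max((c for _, c in F(expr, codons, f, compatible)), default=None)

# Toy instance: two vertices, abstract codon labels "x" and "y",
# and adjacent vertices must receive different labels (assumption).
codon_labels = ["x", "y"]
f = {1: {"x": 3, "y": 1}, 2: {"x": 2, "y": 0}}
differ = lambda l1, l2: l1 != l2

free = ("union", ("vertex", 1, 1), ("vertex", 2, 2))
bonded = ("eta", 1, 2, free)
print(best_cost(free, codon_labels, f, differ),
      best_cost(bonded, codon_labels, f, differ))  # 5 3
```

Without the edge both vertices can take label "x" for a total of 5. With the edge, the best admissible labelings are ("x", "y") and ("y", "x"), both of cost 3.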

By Theorem 2.3 we conclude our main result of this section.

Theorem 4.2 MRSO is computable in polynomial time for every class of structure graphs 
that define implied structure graphs of bounded clique-width. 

Since every class of graphs of bounded tree-width has bounded clique-width [CR05], our
result implies that even the MRSO problem can be solved in polynomial time for structure
graphs which define implied structure graphs of bounded tree-width, which has been shown
in [BFHV05] for the MRSO-dl problem.

Note that our solution is independent of the alphabet $\Sigma$ and the set of complementary pairs T; it
is only important that $\Sigma$ has bounded size.



5 Comparing two solutions of MRSO-dl 

In this section we consider, for two given implied structure graphs $G_1$, $G_2$ and two sequences
of similarity functions $f_i$, $g_i$, $1 \le i \le n$, the complexity of comparing the costs of the
corresponding two solutions of problem MRSO-dl. We will show that these comparison problems are
even complete for the complexity class $P^{NP}_{\|}$, which is assumed to be a proper superset of
NP. Class $P^{NP}_{\|}$ is defined as the set of problems which can be solved in polynomial time with
a number of parallel queries to an oracle in NP. For more results concerning $P^{NP}_{\|}$-hardness
see

We next assume the restricted case that $\Sigma = \{a, b, \overline{a}, \overline{b}\}$ and $T = \{(a, \overline{a}), (b, \overline{b})\}$, see
[Bon04].
Problem 5.1 (Comparing (Equality) MRSO-dl)

INSTANCE: Two structure graphs $G_1 = (\{v_1, \ldots, v_{3n}\}, E_1)$ and $G_2 = (\{u_1, \ldots, u_{3m}\}, E_2)$, n
functions $f_1, \ldots, f_n$, $f_i : \Sigma^3 \to \mathbb{Q}$, and m functions $g_1, \ldots, g_m$, $g_i : \Sigma^3 \to \mathbb{Q}$.

QUESTION: Is MRSO-dl$(G_1, f_1, \ldots, f_n) \le$ MRSO-dl$(G_2, g_1, \ldots, g_m)$?

(QUESTION: Is MRSO-dl$(G_1, f_1, \ldots, f_n) =$ MRSO-dl$(G_2, g_1, \ldots, g_m)$?)

Theorem 5.2 Comparing MRSO-dl and Equality MRSO-dl are $P^{NP}_{\|}$-complete for planar
graphs.

Proof First we have to show that Comparing MRSO-dl and Equality MRSO-dl are contained
in $P^{NP}_{\|}$. To this end, we define a polynomial time algorithm solving the problem Comparing
MRSO-dl with a number of parallel queries to an oracle in NP. We take the MRSO-dl problem
as our oracle, which is in NP. Given two graphs $G_1$ and $G_2$, we ask the oracle the following two
queries: MRSO-dl$(G_1, f_1, \ldots, f_n)$ and MRSO-dl$(G_2, g_1, \ldots, g_m)$. We accept for the problem
Comparing MRSO-dl if both values are equal. Analogously we can define parallel queries to
an oracle in NP for the problem Equality MRSO-dl.



In [SV00] the problem of comparing the maximum vertex cover of two graphs has been
shown to be $P^{NP}_{\|}$-complete. Using the reductions of [GJS76], Theorem 2.7, and [GJ77], Lemma
1, we conclude that comparing the maximum vertex cover of two graphs remains $P^{NP}_{\|}$-complete
for planar graphs of vertex degree at most 3.

Since every vertex cover $C \subseteq V$ of a graph $G = (V, E)$ obviously corresponds to an
independent set $V - C$ in graph G, we conclude that comparing the maximum independent
set of two graphs remains $P^{NP}_{\|}$-complete for planar graphs of vertex degree at most 3.
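The correspondence between vertex covers and independent sets used here can be checked on small graphs. The following Python sketch is only an illustration with hypothetical helper names. It verifies by exhaustive search that the complement of every vertex cover is an independent set, and that the size of a smallest vertex cover plus the size of a largest independent set equals |V|.

```python
from itertools import combinations

def is_vertex_cover(C, edges):
    """Every edge has at least one endpoint in C."""
    return all(u in C or v in C for u, v in edges)

def is_independent(I, edges):
    """No edge has both endpoints in I."""
    return not any(u in I and v in I for u, v in edges)

def min_vertex_cover_size(V, edges):
    for k in range(len(V) + 1):
        if any(is_vertex_cover(set(C), edges) for C in combinations(V, k)):
            return k

def max_independent_set_size(V, edges):
    for k in range(len(V), -1, -1):
        if any(is_independent(set(I), edges) for I in combinations(V, k)):
            return k

# Path on four vertices: 1 - 2 - 3 - 4
V = [1, 2, 3, 4]
E = [(1, 2), (2, 3), (3, 4)]

# The complement of every vertex cover is an independent set.
complements_ok = all(
    is_independent(set(V) - set(C), E)
    for k in range(len(V) + 1)
    for C in combinations(V, k)
    if is_vertex_cover(set(C), E)
)
print(complements_ok, min_vertex_cover_size(V, E), max_independent_set_size(V, E))
```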

This allows us to show the $P^{NP}_{\|}$-hardness of Comparing MRSO-dl and Equality MRSO-dl
by a reduction from comparing independent set for planar graphs of vertex degree at most 3,
following the idea of the proof of Theorem 3 in [Bon04]. Given an instance G of maximum independent
set of vertex degree at most 3, the proof constructs an instance $(G_{impl}, f_1, \ldots, f_n)$ for problem
MRSO-dl, such that MRSO-dl$(G_{impl}, f_1, \ldots, f_n)$ is equal to the size of a maximum independent set
of G. □